Toward Optimal Unsupervised Phoneme Segmentation: A Theoretical and Experimental Investigation
Authors
Abstract
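Phoneme segmentation is a fundamental problem in speech recognition and speech synthesis. Unsupervised phoneme segmentation, which uses no knowledge of linguistic content or acoustic models, is nevertheless a very difficult problem. The essential question is how to define an optimal segmentation. In this paper, we formulate optimal segmentation in a probabilistic framework. Using statistical analysis and information theory, we propose three objective functions as optimization targets: Mean Square Error (MSE), Log Determinant (LD), and Rate Distortion (RD). The RD function in particular is defined on the basis of information rate distortion theory and can be related to the mechanisms of human language perception. Furthermore, using the RD function, we prove that the optimal segmentation is invariant under orthogonal transformations. To optimize the proposed objective functions, we use a time-constrained agglomerative clustering algorithm, and we propose an efficient implementation of it based on integral functions. In our experiments, we evaluate the proposed objective functions on the TIMIT database. Rate Distortion gives the best phoneme detection performance (recall rate of 79.1% within a 20 ms tolerance window), which is better than recently published unsupervised segmentation methods [1], [4], [5].
Keywords: unsupervised phonemic segmentation, optimization, rate distortion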
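The abstract names the optimization procedure (time-constrained agglomerative clustering made efficient with integral functions) but does not give its details. The following is a minimal sketch of that idea, not the authors' implementation. It assumes that the within-segment squared error around the segment mean stands in for the MSE objective, that the "integral functions" can be read as prefix sums over the frames, and that the target number of segments is known in advance. The names `prefix_sums`, `segment_sse`, and `segment` are hypothetical.

```python
def prefix_sums(frames):
    """Return cumulative sums of x and of ||x||^2, each with a leading zero entry."""
    dim = len(frames[0])
    s1 = [[0.0] * dim]   # s1[t] = vector sum of frames[0:t]
    s2 = [0.0]           # s2[t] = sum of squared norms of frames[0:t]
    for x in frames:
        s1.append([a + b for a, b in zip(s1[-1], x)])
        s2.append(s2[-1] + sum(v * v for v in x))
    return s1, s2


def segment_sse(s1, s2, start, end):
    """Squared error of frames[start:end] around their mean, in O(dim) time."""
    n = end - start
    sq = s2[end] - s2[start]
    mean_term = sum((b - a) ** 2 for a, b in zip(s1[start], s1[end])) / n
    return sq - mean_term


def segment(frames, num_segments):
    """Greedily merge adjacent segments until num_segments remain.

    Returns the sorted boundary indices, including 0 and len(frames).
    """
    s1, s2 = prefix_sums(frames)
    bounds = list(range(len(frames) + 1))  # every frame starts as its own segment
    while len(bounds) - 1 > num_segments:
        best_i, best_cost = None, float("inf")
        # Time constraint: only neighbouring segments are merge candidates.
        for i in range(1, len(bounds) - 1):
            left, mid, right = bounds[i - 1], bounds[i], bounds[i + 1]
            cost = (segment_sse(s1, s2, left, right)
                    - segment_sse(s1, s2, left, mid)
                    - segment_sse(s1, s2, mid, right))
            if cost < best_cost:
                best_i, best_cost = i, cost
        del bounds[best_i]  # dropping a boundary merges the two segments around it
    return bounds


# Toy one-dimensional "features" with three obvious plateaus.
frames = [[0.0]] * 4 + [[5.0]] * 3 + [[-2.0]] * 5
print(segment(frames, 3))
```

Because of the prefix sums, the cost of any candidate merge is evaluated without revisiting the frames inside the segments. The loop above still rescans all boundaries after each merge for clarity; a priority queue over merge costs would be the natural next step for long utterances.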
Similar resources
Unsupervised Texture Image Segmentation Using MRFEM Framework
Texture image analysis is one of the most important working realms of image processing in medical sciences and industry. Up to present, different approaches have been proposed for segmentation of texture images. In this paper, we offered unsupervised texture image segmentation based on Markov Random Field (MRF) model. First, we used Gabor filter with different parameters’ (frequency, orientatio...
Unsupervised Phoneme Segmentation Using Mahalanobis Distance
One of the fundamental problems in speech engineering is phoneme segmentation. Approaches to phoneme segmentation can be divided into two categories: supervised and unsupervised segmentation. The approach of this paper belongs to the second category, which tries to perform phonetic segmentation without using any prior knowledge of linguistic contents and acoustic models. In an earlier wor...
Unsupervised Segmentation of Phoneme Sequences based on Pitman-Yor Semi-Markov Model using Phoneme Length Context
Unsupervised segmentation of phoneme sequences is an essential process to obtain unknown words during spoken dialogues. In this segmentation, an input phoneme sequence without delimiters is converted into segmented sub-sequences corresponding to words. The Pitman-Yor semi-Markov model (PYSMM) is promising for this problem, but its performance degrades when it is applied to phoneme-level word seg...
Unsupervised Phoneme Segmentation Using Transformed Cepstrum Features
One of the basic problems in speech engineering is phoneme segmentation, that is, to divide a speech stream into a string of phonemes. Automatic Speech Recognition (ASR) models often require reliable phoneme segmentation in the initial training phase, and Text-to-Speech (TTS) systems need a large speech database with correct phoneme segmentation information for improving the performance. Human ...
Metric learning for unsupervised phoneme segmentation
Unsupervised phoneme segmentation aims at dividing a speech stream into phonemes without using any prior knowledge of linguistic contents and acoustic models. In [1], we formulated this problem into an optimization framework, and developed an objective function, summation of squared error (SSE) based on the Euclidean distance of cepstral features. However, it is unknown whether or not Euclidean...